BACKGROUND

Scenario

I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore:

  • My team wants to understand how casual riders and annual members use Cyclistic bikes differently.
  • My team will design a new marketing strategy to convert casual riders into annual members from these insights.

The Company

About the company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make this possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, she believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Objectives

  • To identify differences between annual members and casual riders of Cyclistic bikes. This will answer the business task question: How do annual members and casual riders use Cyclistic bikes differently?

  • Insights derived from analysis will drive decision making on whether marketing campaigns should be aimed at getting new members or converting casual riders to annual members.

Key Deliverables

  • Business task to be clearly stated
  • A description of all data sources used
  • Documentation of any cleaning or manipulation of data
  • A summary of my analysis
  • Supporting visualizations and key findings
  • My top three recommendations based on my analysis

Metadata

Please note that Divvy’s bike trips dataset (Jan to Dec 2021) was used for this project. To download this data set, please use this link: https://divvy-tripdata.s3.amazonaws.com/index.html. It is also important to note that the company name ‘Cyclistic’ is fictional.

Analysis Stages used in this project are:

  • Ask
  • Prepare
  • Process
  • Analyse
  • Share
  • Act

ASK PHASE

Questions to be asked to proceed with this analysis include:

DATA PREPARATION

Data used has to be:

  • Unbiased
  • Free from errors (data was checked for null values and formatted properly)
  • Original
  • Reliable
  • Current (the latest data was used for this project)
  • Comprehensive

#loading required libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr) #data wrangling
library(ggplot2) # plotting charts
library(skimr) # to get a detailed info on data
library(readr)
library(tidyr) # for tidy data
library(lubridate) # to format dates
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(geosphere) #to calculate distance in metres between two geographical positions

importing data from file directory

data1=read_csv('202101-divvy-tripdata.csv')
## 
## -- Column specification --------------------------------------------------------
## cols(
##   ride_id = col_character(),
##   rideable_type = col_character(),
##   started_at = col_datetime(format = ""),
##   ended_at = col_datetime(format = ""),
##   start_station_name = col_character(),
##   start_station_id = col_character(),
##   end_station_name = col_character(),
##   end_station_id = col_character(),
##   start_lat = col_double(),
##   start_lng = col_double(),
##   end_lat = col_double(),
##   end_lng = col_double(),
##   member_casual = col_character()
## )
data13=read_csv('202102-divvy-tripdata.csv')
data14=read_csv('202103-divvy-tripdata.csv')
data4=read_csv('202104-divvy-tripdata.csv')
data5=read_csv('202105-divvy-tripdata.csv')
data6=read_csv('202106-divvy-tripdata.csv')
data7=read_csv('202107-divvy-tripdata.csv')
data8=read_csv('202108-divvy-tripdata.csv')
data9=read_csv('202109-divvy-tripdata.csv')
data10=read_csv('202110-divvy-tripdata.csv')
data11=read_csv('202111-divvy-tripdata.csv')
data12=read_csv('202112-divvy-tripdata.csv')
dim(data1)
## [1] 96834    13
dim(data13)
## [1] 49622    13
dim(data14)
## [1] 228496     13
dim(data4)
## [1] 337230     13
dim(data5)
## [1] 531633     13
dim(data6)
## [1] 729595     13
dim(data7)
## [1] 822410     13
dim(data8)
## [1] 804352     13
dim(data9)
## [1] 756147     13
dim(data10)
## [1] 631226     13
dim(data11)
## [1] 359978     13
dim(data12)
## [1] 247540     13
#appending all rows since the monthly files have the same columns
data=rbind(data1,data13,data14,data4,data5,data6,data7,data8,data9,data10,data11,data12)

nrow(data)
## [1] 5595063
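
The twelve separate imports and the row-wise append can also be written as one loop. The block below is a minimal sketch of that idea, not the code used in this project: `read_trip_files` is a hypothetical helper, the two demo files are made up, and base R's `read.csv` is used instead of `read_csv` only to keep the example self-contained.

```r
# Hypothetical helper: read every monthly trip file in a folder and stack them.
read_trip_files <- function(dir, pattern = "-divvy-tripdata\\.csv$") {
  files <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) stop("no trip files found in ", dir)
  monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
  # rbind() needs identical column names, which the monthly files share
  do.call(rbind, monthly)
}

# Demonstration on two tiny made-up files in a temporary folder
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(demo_dir, "202101-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "casual"),
          file.path(demo_dir, "202102-divvy-tripdata.csv"), row.names = FALSE)

all_trips <- read_trip_files(demo_dir)
print(nrow(all_trips))
```

Sorting the file names keeps the months in calendar order, because the names start with the year and month.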

#detailed info of the data
skim_without_charts(data)
Data summary
Name data
Number of rows 5595063
Number of columns 13
_______________________
Column type frequency:
character 7
numeric 4
POSIXct 2
________________________
Group variables None

Variable type: character

skim_variable       n_missing  complete_rate  min  max  empty  n_unique  whitespace
ride_id                     0           1.00   16   16      0   5595063           0
rideable_type               0           1.00   11   13      0         3           0
start_station_name     690809           0.88    3   53      0       847           0
start_station_id       690806           0.88    3   36      0       834           0
end_station_name       739170           0.87   10   53      0       844           0
end_station_id         739170           0.87    3   36      0       832           0
member_casual               0           1.00    6    6      0         2           0

Variable type: numeric

skim_variable  n_missing  complete_rate    mean    sd      p0     p25     p50     p75    p100
start_lat              0              1   41.90  0.05   41.64   41.88   41.90   41.93   42.07
start_lng              0              1  -87.65  0.03  -87.84  -87.66  -87.64  -87.63  -87.52
end_lat             4771              1   41.90  0.05   41.39   41.88   41.90   41.93   42.17
end_lng             4771              1  -87.65  0.03  -88.97  -87.66  -87.64  -87.63  -87.49

Variable type: POSIXct

skim_variable  n_missing  complete_rate  min                  max                  median               n_unique
started_at             0              1  2021-01-01 00:02:05  2021-12-31 23:59:48  2021-08-01 01:52:11   4677998
ended_at               0              1  2021-01-01 00:08:39  2022-01-03 17:32:18  2021-08-01 02:21:55   4671372

DATA PROCESSING

Tidy data to be readily available for analysis

#Taking out null values to prevent bias in data
data2<-drop_na(data)
nrow(data2)
## [1] 4588302
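
Dropping incomplete rows reduced the data from 5,595,063 to 4,588,302 rows, so it can be worth checking which columns and which bike types the dropped rows come from. The block below is a small sketch of such a check in base R. The `trips` data frame is made-up toy data, not the Divvy data, and the result names are illustrative.

```r
# Toy data (made up) with missing station names, mimicking the real columns
trips <- data.frame(
  ride_id            = c("r1", "r2", "r3", "r4"),
  rideable_type      = c("classic_bike", "electric_bike",
                         "electric_bike", "docked_bike"),
  start_station_name = c("A", NA, "B", "C"),
  end_station_name   = c("B", NA, NA, "A"),
  stringsAsFactors   = FALSE
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(trips))

# Rows with no missing value at all (the rows drop_na() would keep)
complete <- complete.cases(trips)
kept <- trips[complete, ]

# Which bike types do the dropped rows belong to?
dropped_by_type <- table(trips$rideable_type[!complete])

print(missing_per_column)
print(dropped_by_type)
```

If the dropped rows turn out to be concentrated in one bike type or one membership type, that is worth stating next to any result that depends on those groups.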
#checking data for input errors and inconsistent formats
#ensuring datetime is of the same format in the datetime columns
data2$started_at <- ymd_hms(data2$started_at)
data2$ended_at <- ymd_hms(data2$ended_at)

#check for input errors in character columns using str_length and unique functions
unique(data2$rideable_type)
## [1] "classic_bike"  "electric_bike" "docked_bike"
max(str_length(data2$ride_id))
## [1] 16
min(str_length(data2$ride_id))
## [1] 16
#adding new columns that will be needed for the analysis later
#calculating distance in metres (distGeo expects longitude first, then latitude)
data2<-data2 %>%  mutate(trip_distance=distGeo(cbind(start_lng, start_lat), cbind(end_lng, end_lat)))

#measuring difference between trip start time and end time in secs
data2$triptime_in_secs <- as.numeric(difftime(data2$ended_at, data2$started_at, units ="secs"))
str(data2)
## tibble [4,588,302 x 15] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:4588302] "B9F73448DFBE0D45" "457C7F4B5D3DA135" "57C750326F9FDABE" "4D518C65E338D070" ...
##  $ rideable_type     : chr [1:4588302] "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:4588302], format: "2021-01-24 19:15:38" "2021-01-23 12:57:38" ...
##  $ ended_at          : POSIXct[1:4588302], format: "2021-01-24 19:22:51" "2021-01-23 13:02:10" ...
##  $ start_station_name: chr [1:4588302] "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" ...
##  $ start_station_id  : chr [1:4588302] "17660" "17660" "17660" "17660" ...
##  $ end_station_name  : chr [1:4588302] "Wood St & Augusta Blvd" "California Ave & North Ave" "Wood St & Augusta Blvd" "Wood St & Augusta Blvd" ...
##  $ end_station_id    : chr [1:4588302] "657" "13258" "657" "657" ...
##  $ start_lat         : num [1:4588302] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:4588302] -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num [1:4588302] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:4588302] -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr [1:4588302] "member" "member" "casual" "casual" ...
##  $ trip_distance     : num [1:4588302] 2038 1114 2038 2041 2038 ...
##  $ triptime_in_secs  : num [1:4588302] 433 272 587 537 609 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
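
As a sanity check on the distance column, the straight-line distance between two coordinates can be approximated with the haversine formula. The sketch below is an independent approximation on a sphere, not what `distGeo` computes (that function works on an ellipsoid), so small differences from `trip_distance` are expected. `haversine_m` and the Earth radius of 6,371,000 m are assumptions made for this illustration. The coordinates are the rounded values of the first trip shown in the output of this report.

```r
# Haversine distance in metres between two points given in decimal degrees.
# Hypothetical helper for cross-checking; assumes a spherical Earth.
haversine_m <- function(lng1, lat1, lng2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# First trip in the data: California Ave & Cortez St to Wood St & Augusta Blvd
d <- haversine_m(-87.69670, 41.90036, -87.67220, 41.89918)
print(d)
```

The result should land within a few metres of the roughly 2,038 m reported for that trip in `trip_distance`.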
#keeping only trips longer than 0 secs and no longer than 86400 secs (one day) so that invalid or extreme durations do not skew the analysis
data3<-data2 %>% filter(triptime_in_secs>0, triptime_in_secs<=86400)
dim(data3)
## [1] 4586829      15
#extract month and day from the started_at column
#convert datetime column to date first
data3$trip_date <- as.Date(data3$started_at)
head(data3)
#extract day of week
data3$trip_day <- weekdays(data3$trip_date)                
#extract month of the year
data3$trip_month<-strftime(data3$trip_date, '%b')
head(data3)
#set the factor levels so that days sort from Sunday to Saturday rather than alphabetically
data3$trip_day<-factor(data3$trip_day, levels= c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

#set the factor levels so that months sort in calendar order
data3$trip_month<-factor(data3$trip_month, levels= c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul","Aug","Sep","Oct","Nov","Dec"))

head(data3)
glimpse(data3)
## Rows: 4,586,829
## Columns: 18
## $ ride_id            <chr> "B9F73448DFBE0D45", "457C7F4B5D3DA135", "57C7503...
## $ rideable_type      <chr> "classic_bike", "electric_bike", "electric_bike"...
## $ started_at         <dttm> 2021-01-24 19:15:38, 2021-01-23 12:57:38, 2021-...
## $ ended_at           <dttm> 2021-01-24 19:22:51, 2021-01-23 13:02:10, 2021-...
## $ start_station_name <chr> "California Ave & Cortez St", "California Ave & ...
## $ start_station_id   <chr> "17660", "17660", "17660", "17660", "17660", "17...
## $ end_station_name   <chr> "Wood St & Augusta Blvd", "California Ave & Nort...
## $ end_station_id     <chr> "657", "13258", "657", "657", "657", "KA15040001...
## $ start_lat          <dbl> 41.90036, 41.90041, 41.90037, 41.90038, 41.90036...
## $ start_lng          <dbl> -87.69670, -87.69673, -87.69669, -87.69672, -87....
## $ end_lat            <dbl> 41.89918, 41.91044, 41.89918, 41.89915, 41.89918...
## $ end_lng            <dbl> -87.67220, -87.69689, -87.67218, -87.67218, -87....
## $ member_casual      <chr> "member", "member", "casual", "casual", "casual"...
## $ trip_distance      <dbl> 2037.5917, 1114.0491, 2038.2011, 2040.8390, 2037...
## $ triptime_in_secs   <dbl> 433, 272, 587, 537, 609, 1233, 360, 268, 1103, 1...
## $ trip_date          <date> 2021-01-24, 2021-01-23, 2021-01-09, 2021-01-09,...
## $ trip_day           <fct> Sunday, Saturday, Saturday, Saturday, Sunday, Fr...
## $ trip_month         <fct> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan...
#trip day and month have been ordered and are now factors
# lets change the column name member_casual to something more descriptive
data3<-data3 %>% rename(membership_type=member_casual)
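
The effect of turning the day and month columns into factors can be seen on a small vector. The block below is a minimal illustration with made-up values. The names `days`, `week_levels` and `days_f` are only for this example.

```r
days <- c("Monday", "Sunday", "Friday", "Saturday")

# Plain character vectors sort alphabetically
alphabetical <- sort(days)

# A factor sorts by the order of its levels instead
week_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
days_f <- factor(days, levels = week_levels)
by_week <- as.character(sort(days_f))

print(alphabetical)
print(by_week)
```

The same level order is what `arrange()` and the x-axis of the charts follow, which is why the tables and plots in this report run from Sunday to Saturday and from Jan to Dec.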

DATA ANALYSIS

Analysis will entail the following to draw necessary insights:

  • Number of rides taken by each membership type per month
  • Number of rides taken by each membership type per day of week
  • Average distance travelled by each membership type per month
  • Average distance travelled by each membership type per day of week
  • Average time spent cycling by members and casual riders per day of week
  • Average time spent cycling by members and casual riders per month
  • Most used bike type in terms of number of rides
  • Most used bike type in terms of average distance travelled
  • Total number of rides per month

rides_per_day <- data3 %>%
  group_by(membership_type, trip_day) %>%
  summarise(number_of_rides = n(), .groups = 'drop') %>%
  arrange(trip_day) %>% 
  tidyr::spread(key = membership_type,value = number_of_rides)

print(rides_per_day)
## # A tibble: 7 x 3
##   trip_day  casual member
##   <fct>      <int>  <int>
## 1 Sunday    403452 311210
## 2 Monday    228781 346474
## 3 Tuesday   214819 388118
## 4 Wednesday 218013 397679
## 5 Thursday  224082 373466
## 6 Friday    289863 365773
## 7 Saturday  468033 357066
#From the table above, casual riders use Cyclistic bikes mostly on weekends (Saturday and Sunday are their busiest days), while annual members ride mostly on weekdays, peaking midweek on Wednesday.

rides_per_month <- data3 %>%
  group_by(membership_type, trip_month) %>%
  summarise(number_of_rides = n(), .groups = 'drop') %>%
  arrange(trip_month) %>% 
  tidyr::spread(key = membership_type,value = number_of_rides)
print(rides_per_month)
## # A tibble: 12 x 3
##    trip_month casual member
##    <fct>       <int>  <int>
##  1 Jan         14675  68818
##  2 Feb          8592  34379
##  3 Mar         75551 130045
##  4 Apr        120310 177779
##  5 May        216608 234152
##  6 Jun        303930 304577
##  7 Jul        369207 322892
##  8 Aug        341356 332911
##  9 Sep        292821 328183
## 10 Oct        189029 288851
## 11 Nov         69923 185906
## 12 Dec         45041 131293
#OBSERVATION
#On a monthly basis, members took more rides than casual riders except in July and August, when casual riders took about 14.3% and 2.5% more rides than members respectively.
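
The size of the July and August gap depends on which group is used as the base of the percentage. The short calculation below uses the counts from the table above and shows both choices. The variable names are illustrative only.

```r
# Ride counts for July and August, copied from the rides_per_month table
casual <- c(Jul = 369207, Aug = 341356)
member <- c(Jul = 322892, Aug = 332911)

gap <- casual - member

# Gap expressed relative to member rides ("casual took x% more than members")
pct_of_member <- round(gap / member * 100, 1)

# Gap expressed relative to casual rides
pct_of_casual <- round(gap / casual * 100, 1)

print(rbind(gap, pct_of_member, pct_of_casual))
```

Stating the base alongside the percentage avoids ambiguity when the figures are quoted elsewhere.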


#total number of rides by each membership type

number_per_membership <- data3 %>%
  group_by(membership_type) %>%
  summarize(number_of_rides = n() , .groups = 'drop') %>% 
  tidyr::spread(key = membership_type,value = number_of_rides) 

#Let's see by how much member rides surpassed casual rides, as a percentage of all rides
temptable <- number_per_membership %>% 
  summarise(ratio_to_m=((member-casual)/(member+casual)*100))

#Overall, member rides exceeded casual rides by 10.7% of all rides (about 55.4% of rides were by members and 44.6% by casual riders)
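
The overall comparison can be worked through by hand. The totals below are obtained by summing the monthly counts in the `rides_per_month` table (they add up to the 4,586,829 rows of `data3`). The variable names are illustrative only.

```r
# Total rides per membership type, summed from the monthly table above
rides <- c(casual = 2047043, member = 2539786)

# Share of all rides taken by each membership type, in percent
share <- rides / sum(rides) * 100

# Gap between the two shares, in percentage points of all rides
gap_points <- share[["member"]] - share[["casual"]]

# Gap relative to casual rides ("members took x% more rides than casual riders")
pct_more_than_casual <- (rides[["member"]] - rides[["casual"]]) /
  rides[["casual"]] * 100

print(round(share, 1))
print(round(gap_points, 1))
print(round(pct_more_than_casual, 1))
```

The first gap is the quantity that `(member-casual)/(member+casual)*100` computes. The second is larger because it uses casual rides alone as the base.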

monthly_avg_trip_distance <- data3 %>%
  group_by(membership_type, trip_month) %>%
  summarise(average_trip_dist = mean(trip_distance), .groups = 'drop') %>%
  arrange(trip_month) %>% 
  tidyr::spread(key = membership_type,value = average_trip_dist)
print(monthly_avg_trip_distance)
## # A tibble: 12 x 3
##    trip_month casual member
##    <fct>       <dbl>  <dbl>
##  1 Jan         1921.  1922.
##  2 Feb         2016.  1947.
##  3 Mar         2047.  2103.
##  4 Apr         2048.  2143.
##  5 May         2133.  2184.
##  6 Jun         2187.  2195.
##  7 Jul         2218.  2180.
##  8 Aug         2244.  2140.
##  9 Sep         2266.  2105.
## 10 Oct         2193.  1976.
## 11 Nov         2003.  1863.
## 12 Dec         1930.  1858.
#On average, the difference in distance covered by casual riders and members each month is small (at most about 220 metres, in October). In January, the average distances for the two membership types were practically the same.

#distance traveled per day of the week per membership type

avg_dist_per_weekday <- data3 %>%
  group_by(membership_type, trip_day) %>%
  summarise(avg_trip_dist = mean(trip_distance), .groups = 'drop') %>%
  arrange(trip_day) %>% 
  tidyr::spread(key = membership_type,value = avg_trip_dist)
print(avg_dist_per_weekday)
## # A tibble: 7 x 3
##   trip_day  casual member
##   <fct>      <dbl>  <dbl>
## 1 Sunday     2244.  2187.
## 2 Monday     2066.  2039.
## 3 Tuesday    2094.  2054.
## 4 Wednesday  2119.  2066.
## 5 Thursday   2130.  2047.
## 6 Friday     2166.  2052.
## 7 Saturday   2282.  2187.
avg_ridetime_per_weekday <- data3 %>%
  group_by(membership_type, trip_day) %>%
  summarise(avg_ride_time = mean(triptime_in_secs), .groups = 'drop') %>%
  arrange(trip_day) %>% 
  tidyr::spread(key = membership_type,value = avg_ride_time)
print(avg_ridetime_per_weekday)
## # A tibble: 7 x 3
##   trip_day  casual member
##   <fct>      <dbl>  <dbl>
## 1 Sunday     1942.   911.
## 2 Monday     1724.   763.
## 3 Tuesday    1552.   743.
## 4 Wednesday  1463.   747.
## 5 Thursday   1446.   741.
## 6 Friday     1571.   767.
## 7 Saturday   1828.   888.
#On average, casual riders spend roughly twice as long per ride as members on every day of the week

#average ride time(in secs) per month

avg_ridetime_per_month <- data3 %>%
  group_by(membership_type, trip_month) %>%
  summarise(avg_ride_time = mean(triptime_in_secs), .groups = 'drop') %>%
  arrange(trip_month) %>% 
  tidyr::spread(key = membership_type,value = avg_ride_time)
print(avg_ridetime_per_month)
## # A tibble: 12 x 3
##    trip_month casual member
##    <fct>       <dbl>  <dbl>
##  1 Jan         1337.   722.
##  2 Feb         1863.   882.
##  3 Mar         1933.   819.
##  4 Apr         1926.   855.
##  5 May         1987.   860.
##  6 Jun         1845.   848.
##  7 Jul         1707.   827.
##  8 Aug         1625.   812.
##  9 Sep         1572.   788.
## 10 Oct         1461.   721.
## 11 Nov         1211.   656.
## 12 Dec         1208.   635.
#creating this temp table to see each membership type's share of the combined average ride time per month

temp1 <- avg_ridetime_per_month %>% 
  group_by(trip_month) %>% 
  summarise(total=sum(casual,member),ratio_to_total_c=(casual/total)*100,ratio_to_total_m=(member/total)*100)
print(temp1)
## # A tibble: 12 x 4
##    trip_month total ratio_to_total_c ratio_to_total_m
##    <fct>      <dbl>            <dbl>            <dbl>
##  1 Jan        2059.             64.9             35.1
##  2 Feb        2744.             67.9             32.1
##  3 Mar        2752.             70.2             29.8
##  4 Apr        2780.             69.3             30.7
##  5 May        2847.             69.8             30.2
##  6 Jun        2692.             68.5             31.5
##  7 Jul        2535.             67.4             32.6
##  8 Aug        2438.             66.7             33.3
##  9 Sep        2359.             66.6             33.4
## 10 Oct        2181.             67.0             33.0
## 11 Nov        1866.             64.9             35.1
## 12 Dec        1843.             65.5             34.5
#On average, casual riders' ride time is roughly double that of members: about 85% to 136% longer, depending on the month.
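
How much longer casual rides are than member rides can be checked month by month from the rounded averages printed in the `avg_ridetime_per_month` table. The block below is a small worked calculation on those rounded values. The variable names are illustrative only.

```r
# Average ride time in seconds per month, copied from the table above
casual <- c(Jan = 1337, Feb = 1863, Mar = 1933, Apr = 1926, May = 1987,
            Jun = 1845, Jul = 1707, Aug = 1625, Sep = 1572, Oct = 1461,
            Nov = 1211, Dec = 1208)
member <- c(Jan = 722, Feb = 882, Mar = 819, Apr = 855, May = 860,
            Jun = 848, Jul = 827, Aug = 812, Sep = 788, Oct = 721,
            Nov = 656, Dec = 635)

# Percentage by which the casual average exceeds the member average
pct_longer <- round((casual / member - 1) * 100)

print(pct_longer)
print(range(pct_longer))
```

A value of 100 means the casual average is exactly twice the member average for that month.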

#for seeing the distance travelled for each bike type

dist_travelled_per_bike<-data3 %>%
  group_by(rideable_type,membership_type) %>%
  summarise(distance_of_ride = mean(trip_distance), .groups = 'drop') %>%
  arrange(rideable_type)



#Electric bikes covered longer average distances for both membership types, while docked bikes were the least used.
#It appears that annual members were not inclined to use docked bikes.

#number of times each bike type was used-frequency of usage
frequency_per_bike <- data3 %>%
  group_by(rideable_type,membership_type) %>%
  summarise(number_of_rides = n(), .groups = 'drop') %>%
  arrange(rideable_type) %>% 
  tidyr::spread(key = membership_type,value = number_of_rides)

#Classic bikes were the most used by both membership types, although members used them more than casual riders.
#Although classic bikes were used more frequently, electric bikes covered longer distances than classic bikes.

#Members used electric and classic bikes more and rarely used docked bikes. Could this have accounted for the shorter ride time for members? Recall that casual riders' average ride time was roughly double that of annual members.

#which day had the highest number of rides?

day_with_most_rides <- data3 %>%
  group_by(trip_day) %>%
  summarise(number_of_rides = n(), .groups = 'drop') %>%
  arrange(desc(number_of_rides)) 

#Saturday had the highest number of rides (825,099), followed by Sunday (714,662), owing to the surge in the number of rides taken by casual riders during weekends.
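
The ranking of days can be cross-checked by adding the casual and member counts from the `rides_per_day` table. The block below does that with the printed figures. The variable names are illustrative only.

```r
# Rides per day of week, copied from the rides_per_day table above
casual <- c(Sunday = 403452, Monday = 228781, Tuesday = 214819,
            Wednesday = 218013, Thursday = 224082, Friday = 289863,
            Saturday = 468033)
member <- c(Sunday = 311210, Monday = 346474, Tuesday = 388118,
            Wednesday = 397679, Thursday = 373466, Friday = 365773,
            Saturday = 357066)

# Total rides per day, sorted from busiest to quietest
total <- casual + member
ranked <- sort(total, decreasing = TRUE)

print(ranked)
```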

SHARE

I will be sharing the results of my analysis using charts from the ggplot2 package.

#plotting a bar chart to show difference in number of rides taken by each membership type per month

chart1<-data3 %>%
  group_by(membership_type, trip_month) %>%
  summarise(number_of_trips = n(), .groups = 'drop') %>%
  ggplot(aes(x=trip_month,y=number_of_trips,fill=membership_type, width=.75))+ geom_bar(position="dodge",stat="identity")+
  geom_text(aes(label=number_of_trips), vjust = -0.25, size = 2, position=position_dodge(width=0.9))+
  scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Comparison of Total Number of Rides Taken Monthly by Casual Riders and Members',y="Number of Rides",
       caption="Data from Jan to Dec.2021")

#plotting number of rides by weekdays
  
 chart_2<- data3 %>%
    group_by(membership_type, trip_day) %>%
    summarise(number_of_trips = n(), .groups = 'drop') %>%
    ggplot(aes(x=trip_day,y=number_of_trips,fill=membership_type))+ geom_bar(position="dodge",stat="identity")+
  geom_text(aes(label=number_of_trips), vjust = -0.25, size = 2, position=position_dodge(width=0.75))+
    scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Comparison of Total Number of Rides Taken per Day of Week by Casual Riders and Members',y="Number of Rides")
  
    #plotting average distance covered by each membership type in a weekday
  data3 %>%
    group_by(membership_type, trip_day) %>%
    summarise(avg_trip_dist = mean(trip_distance), .groups = 'drop') %>%
    arrange(trip_day) %>% ggplot(aes(x=trip_day,y=avg_trip_dist,fill=membership_type))+
    geom_col(position=position_dodge(width=0.75))+
    scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Comparison of Average Trip Distance per Day of Week by Casual Riders and Members',
       caption='Data from Jan to Dec 2021')

     #plotting average distance covered by each membership type in a month
  data3 %>%
    group_by(membership_type, trip_month) %>%
    summarise(avg_trip_dist = mean(trip_distance), .groups = 'drop') %>%
    arrange(trip_month) %>% ggplot(aes(x=trip_month,y=avg_trip_dist,fill=membership_type))+
    geom_col(position=position_dodge(width=0.75))+
     scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Comparison of Average Trip Distance Taken Per Month By Casual Riders and Members',
       caption='Data from Jan to Dec 2021')

  #average time taken by each membership type per day of week
  
data3 %>%
    group_by(membership_type, trip_day) %>%
    summarise(avg_trip_time = mean(triptime_in_secs), .groups = 'drop') %>%
    arrange(trip_day) %>% ggplot(aes(x=trip_day,y=avg_trip_time,fill=membership_type))+
    geom_col(position=position_dodge(width=0.75))+
     scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Comparison of Average Trip Time (in secs) Per Day of Week By Casual Riders and Members',
       caption='Data from Jan to Dec. 2021')

   #average time taken by each membership type per month
  
data3 %>%
    group_by(membership_type, trip_month) %>%
    summarise(avg_trip_time = mean(triptime_in_secs), .groups = 'drop') %>%
    arrange(trip_month) %>% ggplot(aes(x=trip_month,y=avg_trip_time,fill=membership_type))+
    geom_col(position=position_dodge(width=0.75))+
     scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Comparison of Average Trip Time(in secs) Per Month By Casual Riders and Members',
       caption='Data from Jan to Dec. 2021')

 #Plotting a chart to show frequency of usage of each bike by membership types per weekday
 
data3 %>%
  group_by(rideable_type,membership_type,trip_day) %>%
  summarise(number_of_rides = n(), .groups = 'drop') %>%
  arrange(rideable_type) %>% ggplot(aes(x=trip_day,y=number_of_rides,fill=membership_type))+
    geom_col(position=position_dodge(width=0.75))+theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black"))+
     scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Frequency of Usage of Each Bike Type by Annual Members and Casual Riders',
       caption='Data from Jan to Dec. 2021')+facet_wrap(~rideable_type)+
   theme(panel.spacing = unit(1, "lines"))

  #Plotting a chart to show frequency of usage of each bike by membership types per month
 
data3 %>%
  group_by(rideable_type,membership_type,trip_month) %>%
  summarise(number_of_rides = n(), .groups = 'drop') %>%
  arrange(rideable_type) %>% ggplot(aes(x=trip_month,y=number_of_rides,fill=membership_type))+
    geom_col(position=position_dodge(width=0.75))+theme(axis.text.x  = element_text(angle=-90, hjust=0.5, size=11,colour="black"))+
     scale_fill_manual("membership_type", values = c("casual" = "orange", "member" = "blue"))+
  labs(title='Cyclistic: Monthly Frequency of Usage of Each Bike Type by Annual Members and Casual Riders',
       caption='Data from Jan to Dec. 2021')+facet_wrap(~rideable_type)

OBSERVATIONS AND INSIGHTS

  • Casual riders’ rides last roughly twice as long as annual members’ rides all through the year.
  • Overall, annual members took more rides than casual riders: about 55% of all rides versus 45%, a gap of about 11% of all rides.
  • On a weekly basis, casual riders ride most on weekends, so there is a spike in casual rides on Saturdays and Sundays. Annual members ride more often on weekdays, with member rides peaking midweek.
  • Variation in average distance covered by both membership types is marginal.
  • The highest number of rides for casual riders was on Saturdays (468,033), while annual members had the most rides on Wednesdays (397,679).
  • Classic bikes were the most used by both membership types, although members used them more than casual riders (36%).
  • Although classic bikes were more frequently used, electric bikes covered longer distances than classic bikes.
  • Saturday had the highest number of rides (825,099), followed by Sunday (714,662), owing to the surge in the number of rides taken by casual riders during weekends.
  • Members used electric and classic bikes more and rarely used docked bikes. Could this have accounted for the shorter ride time for members? Recall that casual riders’ average ride time was roughly double that of annual members.
  • Average ride time for casual riders was longest in May (1,987 secs). For members it was longest in February (882 secs), with May (860 secs) second.

ACT PHASE

My recommendations are presented below:

Thank you.